\documentclass{article}
%\usepackage[a4paper,margin=2.2cm,footskip=.5cm]{geometry}
\usepackage{fullpage}
%-----------------Hyperlink Packages--------------------
\usepackage{hyperref}
\hypersetup{
	 colorlinks   = true,
     citecolor    = black,
     linkcolor    = black,
     urlcolor     = black
}
%-----------------Figure Packages--------------------
\usepackage{graphicx}                       % For figures
\usepackage{stfloats}
\usepackage[tight,footnotesize]{subfigure}  % Create subfigures, ie 1A, 1B
%\usepackage{epsfig} % for postscript graphics files
%------------------Math Packages------------------------
\usepackage{amssymb,amsmath}
\usepackage{textcomp}
\usepackage{mdwmath}
\usepackage{mdwtab}
\usepackage{eqparbox}
%------------------Table Packages-----------------------
\usepackage{rotating}                     % Used to rotate tables
\usepackage{array}                        % Fixed column widths for tables
%-----------------Algorithm Packages--------------------
\usepackage{listings}                     % Source code
\usepackage{algorithm}                    % Pseudo Code
\usepackage[noend]{algpseudocode}
%---------------------------------------------------------

%opening
\title{}

\begin{document}
\bibliographystyle{acm}

\title{
Artificial Intelligence Project 1 \\
Binary Image Denoising
}
\author{Class 1, Team 6}
\date{\today}
\maketitle

\section{Team Members}

\begin{table}[H]
\centering
\begin{tabular}{l l l}
Name & Student ID  & Job\\
\hline
Fan Ziyao & 12330081 & Team leader, implementation\\
Chen Yingcong & 12330049 & Implementation \\
Chen Xiongtao & 12330040 & Modeling, implementation \\
Huang Long & 12330132 & Implementation \\
Zhang Qiuyi & 12330402 & Implementation, documentation

\end{tabular}
\end{table}

\section{Problem Description}
\paragraph{}
Let the observed noisy image be described by an array of binary pixel values $y_i \in \{−1, +1\}$, where the index $i = 1, ..., D$ runs over all pixels. We shall suppose that the image is obtained by taking an unknown noise-free image, described by binary pixel values $x_i \in \{−1, +1\}$ and randomly flipping the sign of pixels with some small probability, say, $10\%$. Given the noisy image, our goal is to recover the original noise-free image.

\section{Modeling}
\paragraph{}
Knowing that the density of noise is small, there should be a strong correlation between $x_i$ and $y_i$. Another prior knowledge is that neighboring pixels $x_i$ and $x_j$ in an image are strongly correlated. This knowledge implies that $\{\mathbf{x}, \mathbf{y}\}$ has the Markov property and can be described with a undirected graph \cite{bishop2006pattern}, so we can use the Markov random field to model this problem.

There are two types of cliques in this graph. The first one $\{x_i, y_i\}$ has an associated energy function that expresses the correlation between these variables. Here we can use a simple energy function for them: $-\eta x_i y_i$, where $\eta$ is a positive constant. This will produce a lower energy (thus encouraging a higher probability) when $x_i$ and $y_i$ have the same sign and a higher energy when they have the opposite sign.

The other type of cliques is $\{x_i, x_j\}$ where $i$ and $j$ are indices of neighboring pixels. Again, we want the energy to be lower when the pixels have the same sign than when they have the opposite sign, and so we choose an energy given by $-\beta x_i x_j$ where $\beta$ is another positive constant. 

Because a potential function is an arbitrary, non-negative function over a maximal clique, we can multiply it by any nonnegative functions of subsets of the clique, or equivalently we can add the corresponding energies. In this model, this allows us to add an extra term $hx_i$ for each pixel $i$ in the noise-free image. This would bias the model towards pixel values that have one particular sign in preference to the other. Hence the complete energy function for this model is:

$$
E(\mathbf{x}, \mathbf{y}) = h \sum_{i}x_i - \beta\sum_{\{i, j\}}x_ix_j - \eta\sum_{i}x_iy_i
$$

which defines a joint distribution over x and y given by:

$$
p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z}\exp\{-E(\mathbf{x}, \mathbf{y})\}
$$

where $Z$ is the \textit{partition function} \cite{bishop2006pattern}. Our goal is then defined as finding an $\mathbf{x}$ such that:

$$
\mathbf{x} = {\operatorname{arg\,min}} E(\mathbf{x}, \mathbf{y})
$$

\section{Algorithm and Implementation}

\subsection{Iterated Conditional Modes}
\paragraph{}
We now fix the elements of $y$ to the observed values given by the pixels of the noisy image, which implicitly defines a conditional distribution $p(x|y)$ over noise-free images. This is an example of the \textit{Ising model}, which has been widely studied in statistical physics. 

For this problem, we wish to find an image $x$ having a high probability (ideally the maximum probability), which is equivalent to making $E$ as small as possible. Since $E$ can be viewed as a function of $M \times N$ variables, with each variable representing a pixel value, we can use the method of gradient descent to search for the minimizer of $E$. More precisely, for each pixel $x_{i}$ we search by evaluating two possible states for $x_{i} = 1$ and $x_{i} = −1$, keeping all other pixel value fixed, and choose between the two states. The search will be repeated until some stopping criterion is satisfied.

A straightforward scheme would be to choose whichever state that leads to a lower energy at each step of the search. This greedy approach, described in Algorithm ~\ref{alg:icm}, is known as the \textit{iterated conditional modes}, or ICM. Despite being a simple and effective strategy, ICM is prone to local optima. Therefore, we need another approach for a better result.

\begin{algorithm}
\centering
\caption{Binary image denoising with iterated conditional modes}
\label{alg:icm}
  \begin{algorithmic}[1]
    \Function{Deniose}{$\mathbf{y}$, $\beta$, $\eta$, $h$}
        \Comment{$\mathbf{y}$ is the noisy image}
        \State Initialize $\mathbf{x}$ with $\mathbf{x} = \mathbf{y}$
        \State Initialize $E_{best}$ with $E_{best} = E(\mathbf{x}, \mathbf{y})$
	    \For{$k = 1 \to k_{max}$}
	    	\For{each pixel $x_{i}$}
	    		\State $E_1 = E(\mathbf{x}, \mathbf{y})$
	    		\State $x_{i} = - x_{i}$ \Comment{flip the pixel}
	    		\State $E_2 = E(\mathbf{x}, \mathbf{y})$
	    		\If{$E1 > E2$}
		    		\If{$E_2 < E_{best}$}
		    			\State Record the best energy $E_{best} = E_2$
		    		\EndIf
		    	\Else
		    		\State $x_{i} = - x_{i}$ \Comment{flip the pixel back}
		    	\EndIf
	    	\EndFor
	    \EndFor
      \Return $\mathbf{x}$
    \EndFunction
  \end{algorithmic}
\end{algorithm}

\subsection{Simulated Annealing}
\paragraph{}
It is easy to see that $E(x,y)$ is a non-convex function of $x$, which implies that there will be multiple local minima for $E$ depending on the initial state. To search for the global optimun, we need a global optimization strategy. In this project, we use \textit{simulated annealing}(SA), which can be easily integrated with method of gradient descent. By adding randomness to our searching strategy, we increase the probability of reaching a global optimum (and SA does approximate one in this model, as proven in \cite{geman1984stochastic}).The complete algorithm is described in Algorithm ~\ref{alg:sa}.

\begin{algorithm}
\centering
\caption{Binary image denoising with simulated annealing}
\label{alg:sa}
  \begin{algorithmic}[1]
    \Function{Deniose}{$\mathbf{y}$, $\beta$, $\eta$, $h$}
        \Comment{$\mathbf{y}$ is the noisy image}
        \State Initialize $\mathbf{x}$ with $\mathbf{x} = \mathbf{y}$
        \State Initialize $Ebest$ with $Ebest = E(\mathbf{x}, \mathbf{y})$
	    \For{$k = 1 \to k_{max}$}
	    	\State Compute the temperature $t$ = temperature($k,k_{max}$)
	    	\For{each pixel $x_{i}$}
	    		\State $E_1 = E(\mathbf{x}, \mathbf{y})$
	    		\State $x_{i} = - x_{i}$ \Comment{flip the pixel}
	    		\State $E_2 = E(\mathbf{x}, \mathbf{y})$
	    		\State Compute the transition probability $p = prob(E_1, E_2, t)$
	    		\If{$p > q$ where $q$ is a random number in $[0, 1]$}
		    		\If{$E_2 < E_{best}$}
		    			\State Record the best energy $Ebest = E_2$
		    		\EndIf
		    	\Else
		    		\State $x_{i} = - x_{i}$ \Comment{flip the pixel back}
		    	\EndIf
	    	\EndFor
	    \EndFor
      \Return $\mathbf{x}$
    \EndFunction
  \end{algorithmic}
\end{algorithm}

\begin{description}
\item Remark 1.\hfill \\
The temperature function is a decreasing function of iterations. It must converge to $0$ as $k \to k_{max}$. For this implementation we use

$$
\text{temperature}(k, k_{max}) = \frac{1}{500}(\frac{1}{k} - \frac{1}{k_{max}})
$$
\item Remark 2.\hfill \\
The probability transition function used for this implementation is
$$
prob(E_1, E_2, t) =  \left\{
     \begin{array}{lr}
       1, & E_1 > E_2 \\
       \mathbf{e}^{\frac{E_1 - E2}{t}}, & E_1 \leq E_2
     \end{array}
   \right.
$$
\end{description}

\subsection{Local Optimization}

\paragraph{}
With either ICM or SA, we only need to alter the $E$ for values relavent to the flipped pixel at each step. Let $x_i$ be the pixel to be flipped, we can just update $E$ with the new $hx_i + \eta x_i y_i + \sum \beta x_i x_j$ where $x_j$ are the neighboring pixels of $x_i$. Then we can localize the calculation of $E_1$ and $E_2$, making the implementation much faster.

\section{Experiment Result}

\paragraph{}
We first take a binary image(for simplicity we choose a black-and-white one), then flip its pixels with 10\% probability to generate the noisy image. After that, we try to restore the original image by running both ICM and SA denoising, and compare their results. For the energy function, we choose $\beta = 1 \times 10^{-4}, \eta = 2.1 \times 10^{-4}, h = 0$. The maximum number of iteration $k_{max}$ is $30$ for both ICM and SA.

The experiment results are shown in Figure ~\ref{fig:original} - \ref{fig:sa}.

\begin{figure}[H]
	\begin{minipage}[b]{0.48\linewidth}
		\centering
		\includegraphics[width=180pt]{../img/in.png}
		\caption{Original image}
		\label{fig:original}
	\end{minipage}
	\begin{minipage}[b]{0.48\linewidth}
		\centering
		\includegraphics[width=180pt]{../img/flipped.png}
		\caption{Noisy image with 10\% pixels flipped}
		\label{fig:noisy}
	\end{minipage}
\end{figure}
\begin{figure}[H]
	\centering
	\begin{minipage}[b]{0.45\linewidth}
		\centering
		\includegraphics[width=180pt]{../img/icm.png}
		\caption{Denoised with ICM}
		\label{fig:icm}
	\end{minipage}
	\begin{minipage}[b]{0.45\linewidth}
		\centering
		\includegraphics[width=180pt]{../img/best.png}
		\caption{Denoised with SA}
		\label{fig:sa}
	\end{minipage}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[width=300pt]{../img/ICM-energy-time.png}
\caption{Time-Energy series of ICM}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[width=300pt]{../img/SA-energy-time.png}
\caption{Time-Energy series of SA}
\end{figure}
To evaluate the denoised results, we count the non-zero elements in $\mathbf{x} - \mathbf{y}$. ICM produces an image with  96.21\% of the pixels agree with the original one, while SA yeilds 99.19\%.

\section{Discussion}
\paragraph{}
Both ICM and SA restored most of the original image, though SA is visually much better than ICM, which is not surprising -- after all, SA approximates the global optimum while ICM tends to stuck in local optima.

From a visual inspection of the images, it is apparent that SA successfully removed the majority of the outlying noises, especially those in the uniform areas. Since the neighborhood defined in $E$ is only 4-adjacent, SA can not capture the overall attributes of the image, such as the egdes. The randomness incorporated by SA also makes it uncertain about these noises. Therefore, in areas where the noises obscure the edges of the object, some pixels have been mistakenly flipped; in area with high density of noises, some of them have been clustered instead of being removed. Aside from these small artifacts, the result produced by SA is rather appealing.

\bibliography{denoise}
\end{document}
